A symbolic approach to automatic multiword term structuring

Authors

  • Eric SanJuan
  • James Dowdall
  • Fidelia Ibekwe-Sanjuan
  • Fabio Rinaldi

Abstract

This paper presents a three-level structuring of multiword terms (MWTs) based on lexical inclusion, WordNet similarity and a clustering approach. Clustering terms with automatic data analysis methods offers a useful way of organizing a domain's knowledge structures, which supports several information-oriented tasks such as science and technology watch, text mining, computer-assisted ontology population and question answering (Q-A). This paper explores how this three-level term structuring brings to light the knowledge structures of a genomics corpus, and compares the resulting domain topics against a hand-built ontology (the GENIA ontology). Ways of integrating the results into a Q-A system are discussed.
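The first structuring level, lexical inclusion, can be illustrated with a minimal sketch. It assumes, as a simplification not taken from the paper, that a shorter term is included in a longer one when its words occur as a contiguous, case-insensitive word sequence inside the longer term. The function names `is_lexically_included` and `inclusion_links` and the sample terms are hypothetical, and the WordNet-similarity and clustering levels are not shown.

```python
def is_lexically_included(shorter: str, longer: str) -> bool:
    """Hypothetical simplification: True if the words of `shorter` appear,
    in order and contiguously, inside the strictly longer term `longer`."""
    s = shorter.lower().split()
    l = longer.lower().split()
    if len(s) >= len(l):
        return False
    # Slide a window of len(s) words over the longer term.
    return any(l[i:i + len(s)] == s for i in range(len(l) - len(s) + 1))


def inclusion_links(terms: list[str]) -> list[tuple[str, str]]:
    """Pair each term with every longer term that lexically includes it."""
    return [(a, b) for a in terms for b in terms
            if a != b and is_lexically_included(a, b)]


# Hypothetical sample terms, not drawn from the paper's corpus.
terms = ["T cell", "activated T cell", "gene expression",
         "gene expression regulation", "transcription factor"]
for short, long in inclusion_links(terms):
    print(f"{short!r} -> {long!r}")
```

Links of this kind group a term with its longer variants; under this sketch's assumptions, "transcription factor" stays unlinked because no other term contains it, which is where similarity- or clustering-based levels would have to take over.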

Journal:
  • Computer Speech & Language

Volume 19

Published 2005